Using a VOM model for reconstructing potential coding regions in EST sequences
Abstract
This paper presents a method for annotating coding and noncoding DNA regions using variable order Markov (VOM) models. A main advantage of VOM models is that their order may vary for different sequences, depending on the sequences' statistics. As a result, VOM models are more flexible in their parameterization and can be trained on relatively short sequences and on low-quality datasets, such as expressed sequence tags (ESTs). The paper also presents a modified VOM model for detecting and correcting the insertion and deletion sequencing errors that are commonly found in ESTs. In a series of experiments, the proposed method is found to be robust to random errors in these sequences.
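To make the idea of a variable context length concrete, the following is a minimal sketch of a toy VOM model over DNA. It is not the paper's method: the class name `VOMModel`, the parameters `max_order` and `min_count`, the longest-reliable-suffix rule, and the add-one smoothing are all assumptions chosen for illustration, and the sketch does not attempt the insertion and deletion correction described above.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class VOMModel:
    """Toy variable-order Markov model over DNA (illustrative assumption,
    not the model described in the paper)."""

    def __init__(self, max_order=3, min_count=2):
        self.max_order = max_order  # longest context length considered
        self.min_count = min_count  # observations needed to trust a context
        # counts[context][symbol] = times `symbol` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        """Count every symbol after each of its contexts of length 0..max_order."""
        for seq in sequences:
            for i, sym in enumerate(seq):
                for k in range(min(i, self.max_order) + 1):
                    self.counts[seq[i - k:i]][sym] += 1

    def context_for(self, history):
        """Return the longest suffix of `history` seen at least min_count times.

        Falls back to the empty context when no suffix is reliable, which is
        what lets the effective order vary with the training statistics.
        """
        for k in range(min(len(history), self.max_order), 0, -1):
            ctx = history[-k:]
            table = self.counts.get(ctx)
            if table is not None and sum(table.values()) >= self.min_count:
                return ctx
        return ""

    def log_prob(self, seq):
        """Log-probability of `seq` with add-one smoothing over ACGT."""
        total = 0.0
        for i, sym in enumerate(seq):
            table = self.counts.get(self.context_for(seq[:i]), {})
            seen = sum(table.values())
            total += math.log((table.get(sym, 0) + 1) / (seen + len(ALPHABET)))
        return total


def coding_score(seq, coding_model, noncoding_model):
    """Log-likelihood ratio; positive values favour the coding model."""
    return coding_model.log_prob(seq) - noncoding_model.log_prob(seq)
```

In this sketch, one model would be trained on known coding sequences and another on noncoding ones, and `coding_score` would then compare the two for a query region. Short or sparse training data simply causes more fallbacks to shorter contexts, which is the flexibility the abstract attributes to VOM models.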
Similar resources
Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)
Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses post-transcriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...
ESTScan: A Program for Detecting, Evaluating, and Reconstructing Potential Coding Regions in EST Sequences
One of the problems associated with the large-scale analysis of unannotated, low quality EST sequences is the detection of coding regions and the correction of frameshift errors that they often contain. We introduce a new type of hidden Markov model that explicitly deals with the possibility of errors in the sequence to analyze, and incorporates a method for correcting these errors. This model ...
OrfPredictor: predicting protein-coding regions in EST-derived sequences
OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in BLASTX alignments; otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The ...
P-215: Discovery of A Novel APA Variant of A Human Potential Gene Based on Expressed Sequenced Tags Analysis
Background: Expressed sequence tags (ESTs) are sequences of cDNA fragments prepared from different tissue sources. There are over one million of these sequences in the publicly available database, and these sequences are believed to represent more than half of all human genes. The ESTs belong to different cDNA libraries, each prepared from one particular cell type, organ, or tumor. Therefore, th...
TargetFinder and Annotator: a Simple Approach for Finding Full-length Target cDNAs and for Annotating EST Sequences
In a large-scale EST (expressed sequence tag) or cDNA sequencing project, it is often desirable to know whether the ESTs identify genes of interest and whether the cloned cDNAs include intact coding regions (that is, are full-length). In this work, we present two Perl tools, TargetFinder and Annotator. TargetFinder automates the identification of full-length cDNAs from assembled EST sequences includi...